A68k - a freely distributable assembler for the Amiga
by Charlie Gibbs
with special thanks to
Brian R. Anderson and Jeff Lydiatt
(Version 1.2 - July 11, 1988)
Note: This program is NOT Public Domain. Permission is given
to freely distribute this program provided no fee is charged, and this
documentation file is included with the program.
This assembler is based on Brian R. Anderson's 68000 cross-
assembler published in Dr. Dobb's Journal, April through June 1986.
I have converted it to produce AmigaDOS-format object modules, and
have made many enhancements, such as macros and include files.
My first step was to convert the original Modula-2 code into C.
I did this for two reasons. First, I had access to a C compiler, but
not a Modula-2 compiler. Second, I like C better anyway.
The executable code generator (GetObjectCode and MergeModes)
is essentially the same as in the original article, aside from its
translation into C. I have almost completely rewritten the remainder
of the code, however, in order to remove restrictions, add enhancements,
and adapt it to the AmigaDOS environment. Since the only reference book
available to me was the AmigaDOS Developer's Manual (Bantam, February
1986), the assembler and the remainder of this document work in terms
of that book.
RESTRICTIONS
Let's get these out of the way first. There are a few things that I
have not yet implemented, and some outright bugs that would take too long
to correct for this version.
o The verification file (-v) option is not supported. Diagnostic
messages always appear on the console. They also appear in the
listing file, however (see extensions below). You can produce
an error file by redirecting console output to a file - the
line number counter and final summary are displayed on stderr
so you can still see what's happening.
o The directory names in the include directory list (-i) must be
separated by commas. The list may not be enclosed in quotes.
o Labels assigned by EQUR and REG directives are case-sensitive.
o The following directives are not supported, and will be flagged as
invalid op-codes:
RORG
OFFSET
NOPAGE
LLEN
PLEN
NOOBJ
FAIL
FORMAT
NOFORMAT
MASK2
I feel that NOPAGE, LLEN, and PLEN should not be defined within a
source module. It doesn't make sense to me to have to change your
program just because you want to print your listings on different
paper. The command-line option "-p" (see below) can be used as a
replacement for PLEN.
EXTENSIONS
Now for the good stuff:
o Labels can be any length that will fit onto one source line
(currently 127 bytes maximum). Since labels are stored on the
heap, the number of labels that can be processed is limited only
by available memory, which can be increased by using the "-w"
option (see below).
o Since section data and user macro definitions are stored on the
same heap as the symbol table (see above), they too are limited
only by available memory. (Actually, there is a hard-coded limit
of 32767 sections, but I doubt anyone will run into that one.)
o The only values a label cannot take are the register names - the
assembler can distinguish between the same name used as a label,
instruction name or directive, macro name, or section name.
o Section and user macro names appear in the symbol table dump, and
will also be cross-referenced. Their names can be the same as any
label (see above); the assembler can sort them out.
o Includes and macro calls can be nested indefinitely, limited only
by available memory. The message "Secondary heap overflow -
assembly terminated" will be displayed if memory is exhausted.
You can increase the size of this heap using the -w parameter
(see below). Recursive macros are supported; recursive includes
will, of course, result in a loop that will be broken only when
the heap overflows.
o The EVEN directive forces alignment on a word (2-byte) boundary.
It does the same thing as CNOP 0,2.
(This one is left over from the original code.)
o Branch (Bcc) instructions to a previously-defined label will be
automatically converted to short form if possible. This feature is
not available for forward branches, since in pass 1 the assembler
doesn't yet know how far the branch must go.
o Backward references to labels within the current CODE section
will be converted to PC relative addressing with displacement
if this mode is legal for the instruction.
o If a MOVEM instruction only specifies one register, it is converted
to the corresponding MOVE instruction. Instructions such as
MOVEM D0-D0,label will not be converted, however.
o ADD, SUB, and MOVE instructions will be converted to ADDQ, SUBQ,
and MOVEQ respectively if possible. Instructions coded explicitly
as (for example) ADDA or ADDI will not be converted.
o ADD, CMP, SUB, and MOVE to an address register are converted to
ADDA, CMPA, SUBA, and MOVEA respectively, except if an ADD, SUB,
or MOVE instruction has already been converted to quick form.
o ADD, AND, CMP, EOR, OR, and SUB of an immediate value are converted
to ADDI, ANDI, CMPI, EORI, ORI, and SUBI respectively (unless the
address register or quick conversion above has already been done).
o If both operands of a CMP instruction are postincrement mode, the
instruction is converted to CMPM.
o Operands of the form 0(An) will be treated as (An).
o The SECTION directive allows a third parameter. This can be
specified as either CHIP or FAST (upper- or lower-case). If this
parameter is present, the hunk will be written with the MEMF_CHIP
or MEMF_FAST bit set. This allows you to produce "pre-ATOMized"
object modules.
o The synonyms DATA and BSS are accepted for SECTION directives
starting data or BSS hunks. The CHIP and FAST options mentioned
above can also be used, e.g. BSS name,CHIP.
o The following synonyms have been implemented for compatibility
with the Aztec assembler:
CSEG is treated the same as CODE or SECTION name,CODE
DSEG is treated the same as DATA or SECTION name,DATA
o The ability to produce Motorola S-records is retained from the
original code. The -s option causes the assembler to produce
S-format instead of AmigaDOS format. Relocatable code cannot be
produced in this format.
o Error messages consist of three parts.
The position of the offending line is given as a line number
within the current module. If the line is within a macro
expansion or INCLUDE file, the position of the macro call or INCLUDE
statement in the outer module is given as well. This process
is repeated until the outermost source module is reached.
Next, the offending source line itself is listed.
Finally, the errors for that line are displayed. A flag
(^) is placed under the column where the error was detected.
o Named local labels are supported. These work the same as the
local labels supported by the Metacomco assembler (nnn$) but
can be formed in the same manner as normal labels, except that
they must be preceded by a backslash.
o The following synonyms have been implemented for compatibility
with the Assempro assembler:
ENDIF is treated the same as ENDC
= is treated the same as EQU
| is treated the same as ! (logical OR)
o Quotation marks (") can be used as string delimiters
as well as apostrophes ('). Any given string must begin
and end with the same delimiter. This allows such statements
as the following:
MOVEQ '"',D0
DC.B "This is Charlie's assembler."
Note that you can still place a delimiter character within a string
delimited by that same character if you double it, e.g.
MOVEQ """",D0
DC.B 'This is Charlie''s assembler.'
o If any errors are found in the assembly, the object code file
will be scratched, unless you specified the -k (keep) flag
on the command line.
o The symbol .A68K (note upper case) is automatically defined
as a SET symbol having an absolute value of 1. This enables
a source program to determine whether it is being assembled
on this assembler.
o A zeroth positional macro parameter (\0) is supported. It
is replaced by the length of the macro call (B, W, or L,
defaulting to W). For instance, given the macro:
moov MACRO
move.\0 \1,\2
ENDM
the macro call
moov.l d0,d1
would be expanded as
move.l d0,d1
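The quick-form and short-branch conversions listed above depend on operand ranges fixed by the 68000 instruction set. The sketch below shows only those range checks. The function names are hypothetical, and the exact conditions A68k applies (operand size, destination type, and so on) are not stated in this document, so treat it as an illustration rather than the assembler's own logic.

```c
#include <assert.h>
#include <stdbool.h>

/* ADDQ and SUBQ encode an immediate value in the range 1..8. */
bool fits_addq_subq(long value)
{
    return value >= 1 && value <= 8;
}

/* MOVEQ encodes a sign-extended 8-bit immediate (-128..127). */
bool fits_moveq(long value)
{
    return value >= -128 && value <= 127;
}

/*
 * A short Bcc holds an 8-bit displacement measured from the address
 * of the instruction plus 2.  A displacement of zero is reserved to
 * mean "a 16-bit displacement follows", so it cannot be used.
 */
bool fits_short_branch(long branch_addr, long target_addr)
{
    long disp;

    /* 68000 instructions must lie on even addresses. */
    assert((branch_addr & 1) == 0 && (target_addr & 1) == 0);
    disp = target_addr - (branch_addr + 2);
    return disp != 0 && disp >= -128 && disp <= 127;
}
```

For a backward branch the target address is already known in pass 1, which is why the check can be made there; a forward branch would need the address of a label that has not been seen yet.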
HOW TO USE IT
The command-line syntax to run the assembler is as follows:
a68k <source file>
[-d]
[-e<equate file>]
[-h<header file>]
[-i<include dirlist>]
[-k]
[-l<listing file>]
[-o<object file>]
[-p<page depth>]
[-q[<quiet interval>]]
[-s]
[-t]
[-w[<primary-heap-size>][,<secondary-heap-size>]]
[-x<listing file>]
[-z[<debug-start-line>][,<debug-end-line>]]
These options can be given in any order, and the source file name can
appear before all switches, after them, or anywhere in the middle.
Option values, if any, must immediately follow the keyword with
no intervening spaces.
If the -o keyword is omitted, the object file will be given a default
name. It is created by replacing all characters after the last period in
the source file name by "o". For example, if the source file name is
"myprog.asm", the object file name defaults to "myprog.o". A source name
of "my.new.prog.asm" produces a default object file name of "my.new.prog.o".
If the source file name does not contain a period, ".o" is appended to it
to produce the default object file name.
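The default-name rule can be sketched as follows. This is a minimal illustration of the rule as described, not A68k's actual code; the function name default_name and its parameters are assumptions. It does not treat a period inside a directory name specially.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a default output name from a source name: everything after
 * the last period is replaced by ext; if there is no period, "." and
 * ext are appended.  Returns 0 on success, -1 if out is too small.
 */
int default_name(const char *source, const char *ext,
                 char *out, size_t outsize)
{
    const char *dot;
    size_t stem;
    int n;

    assert(source != NULL && ext != NULL && out != NULL);
    dot = strrchr(source, '.');
    stem = dot ? (size_t)(dot - source) : strlen(source);
    n = snprintf(out, outsize, "%.*s.%s", (int)stem, source, ext);
    return (n >= 0 && (size_t)n < outsize) ? 0 : -1;
}
```

Calling default_name("my.new.prog.asm", "o", buf, sizeof buf) leaves "my.new.prog.o" in buf; passing "lst" or "equ" as ext gives the listing and equate file defaults in the same way.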
The default value for the listing file name is arrived at in the same
way as the object file name, except that ".lst" is appended instead of ".o".
If you don't specify this parameter, no listing file will be produced.
If you specify -x (see below), -l (with the default name) is assumed,
although you can still use this parameter if you wish.
The default value for the equate file name is arrived at in the same
way as the object file name, except that ".equ" is appended instead of ".o".
The include directory list is a list of directory names separated by
commas. No embedded blanks are allowed. For example, the specification
-imylib,df1:another.lib
will cause include files to be searched for first in the current directory,
then in "mylib", then in "df1:another.lib".
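The search order just described might be built like this. It is a sketch only: the function name, the fixed table sizes, and the use of an empty string to stand for the current directory are all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_DIRS    16   /* hypothetical limits for this sketch */
#define MAX_DIR_LEN 64

/*
 * Split a comma-separated -i value into dirs[], placing the current
 * directory (an empty string) first.  Empty or over-long entries are
 * skipped.  Returns the number of entries stored.
 */
int build_search_list(const char *dirlist,
                      char dirs[][MAX_DIR_LEN], int max)
{
    int count = 0;

    assert(dirlist != NULL && max >= 1);
    dirs[count++][0] = '\0';              /* current directory first */
    while (*dirlist != '\0' && count < max) {
        size_t len = strcspn(dirlist, ",");
        if (len > 0 && len < MAX_DIR_LEN) {
            memcpy(dirs[count], dirlist, len);
            dirs[count][len] = '\0';
            count++;
        }
        dirlist += len;
        if (*dirlist == ',')
            dirlist++;                    /* step over the separator */
    }
    return count;
}
```

For "mylib,df1:another.lib" this yields three entries: the empty string, "mylib", and "df1:another.lib", in that order.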
The -d keyword causes symbol table entries (hunk_symbol) to be written
to the object module for the use of symbolic debuggers.
The -k keyword causes the object file to be kept if any errors were
found. Otherwise, it will be scratched if any errors occurred.
The -l keyword causes a listing file to be produced. If you want
the listing file to include a symbol table dump and cross-reference,
use the -x keyword instead (see below).
The -p keyword causes the page depth to be set to the specified value.
If omitted, a default of 60 lines (-p60) is assumed.
The -q keyword changes the interval at which A68k displays the
current line number (the default is every 10 lines, i.e. -q10). If
you specify -q0 or -q without a value, no line numbers will be displayed.
This will speed up assemblies slightly by reducing console I/O.
The -s keyword, if specified, causes the object file to be written in
Motorola S-record format. If omitted, AmigaDOS format will be produced.
The default name for an S-record file has ".s" appended to the source name,
rather than ".o"; this can still be overridden with the -o keyword, though.
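For readers unfamiliar with S-records, the sketch below formats one data record with a 16-bit address (type S1) as defined by the Motorola format: a byte count, the address, the data, and a checksum that is the ones' complement of the low byte of the sum of all preceding bytes. This document does not say which record types A68k emits, so the function (whose name is hypothetical) illustrates the format only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format one S1 record into out, without a trailing newline.
 * out must have room for 2*len + 11 characters.
 * Returns the number of characters written.
 */
int format_s1(unsigned addr, const unsigned char *data, size_t len,
              char *out)
{
    unsigned count, sum;
    size_t i;
    int pos;

    assert(len <= 252 && addr <= 0xFFFFu);
    count = (unsigned)len + 3;   /* 2 address bytes + data + checksum */
    sum = count + ((addr >> 8) & 0xFFu) + (addr & 0xFFu);
    pos = sprintf(out, "S1%02X%04X", count, addr);
    for (i = 0; i < len; i++) {
        pos += sprintf(out + pos, "%02X", (unsigned)data[i]);
        sum += data[i];
    }
    /* Checksum: ones' complement of the low byte of the sum. */
    pos += sprintf(out + pos, "%02X", ~sum & 0xFFu);
    return pos;
}
```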
The -t keyword allows tabs in the source file to be passed through
to the listing file, rather than being expanded. In addition, tabs will
be generated in the listing file to skip from the object code to the
source statement, etc. This can greatly reduce the size of the listing
file, as well as making it quicker to produce. Do not use this option
if you will be displaying or printing the listing file on a device which
does not have tab stops at every 8th position.
The -w keyword specifies the size of the heaps used. The primary heap
stores the symbol table, user macro text, relocation information, and
cross-reference information. The secondary heap stores information for
nested macro calls and include files. The primary heap size defaults to
32768 bytes, which should be enough for all but the largest assemblies.
The secondary heap size defaults to 1024 bytes, which should be enough
unless you use very deeply nested macros and/or include files with long
path names. You can specify either or both parameters. For example:
-w40000 secondary heap size remains at 1024 bytes
-w,2000 primary heap size remains at 32768 bytes
-w40000,2000 increases the size of both heaps
If you're really tight for memory, and are assembling small modules, you
can use this keyword to shrink the heaps below their default sizes.
At the end of an assembly, a message will be displayed giving the
amount of heap space actually used, in the form of the -w command
you would have to enter to allocate the minimum heap space.
See below for a layout of the heaps.
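As an illustration of the -w value syntax, here is a small parser sketch. It is not taken from A68k; the function name and its error handling are assumptions. The caller is expected to preload the defaults (32768 and 1024), and any size that is omitted keeps the value already stored.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse the text following "-w".  Returns 0 on success, -1 if the
 * text is malformed (on failure *primary may already have changed).
 */
int parse_heap_option(const char *arg, long *primary, long *secondary)
{
    char *end;
    long v;

    assert(arg != NULL && primary != NULL && secondary != NULL);
    if (*arg != '\0' && *arg != ',') {    /* primary size present */
        v = strtol(arg, &end, 10);
        if (end == arg || v <= 0)
            return -1;
        *primary = v;
        arg = end;
    }
    if (*arg == ',') {                    /* secondary size present */
        arg++;
        v = strtol(arg, &end, 10);
        if (end == arg || v <= 0)
            return -1;
        *secondary = v;
        arg = end;
    }
    return *arg == '\0' ? 0 : -1;         /* reject trailing junk */
}
```

With the defaults preloaded, the text "40000" changes only the primary size, ",2000" changes only the secondary size, and "40000,2000" changes both.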
The -x keyword works the same as -l, except that a symbol table
dump, including cross-reference information, will be added to the end
of the listing file.
The -z keyword is provided for debugging purposes. You can cause
the assembler to list a range of source lines, complete with line number
and current location counter value, during both passes. For example:
-z lists all source lines
-z100,200 lists lines 100 through 200
-z100 lists all lines starting at 100
-z,100 lists the first 100 lines
If you wish to override the default object and (optionally) listing
file names, you can omit the -o and -l keywords. The assembler interprets
the first three parameters without leading hyphens as the source, object,
and listing file names respectively. Anything over three file names is an
error, as is attempting to respecify a file name with the -o or -l keywords.
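The positional rule above can be sketched as a single pass over the parameters. The structure and function names are hypothetical, and keyword options are assumed to be handled elsewhere (with any name given by -o or -l already stored before this function runs).

```c
#include <assert.h>
#include <stddef.h>

struct file_names {
    const char *source;
    const char *object;
    const char *listing;
};

/*
 * Assign parameters without a leading hyphen to source, object and
 * listing, in that order.  Returns 0 on success, -1 if a fourth name
 * appears or a name is given that was already set.
 */
int assign_positional(int argc, const char *const argv[],
                      struct file_names *f)
{
    const char **slot[3];
    int used = 0;
    int i;

    assert(f != NULL);
    slot[0] = &f->source;
    slot[1] = &f->object;
    slot[2] = &f->listing;
    for (i = 0; i < argc; i++) {
        if (argv[i][0] == '-')
            continue;                 /* keyword option, not a name */
        if (used == 3 || *slot[used] != NULL)
            return -1;                /* too many names, or respecified */
        *slot[used++] = argv[i];
    }
    return 0;
}
```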
The primary heap is built from both ends. Symbol table entries
(including labels) and macro text are stored during pass 1. Cross-reference
data is stored during pass 2. Relocation information is also stored during
pass 2, but is cleared at the end of each SECTION. Since it is no longer
needed once dumped, the space is freed for re-use by the next section's
relocation information. The expression parser also uses the primary heap
to store its working stacks - this space is freed as soon as an expression
has been evaluated.
The fixed portion of each symbol table entry occupies 16 bytes. The
labels and macro text occupy just enough space to hold their strings
(including the end-of-string delimiter) - they are all pointed to by fixed
symbol table entries. Relocation entries occupy 10 bytes each.
Cross-reference entries are 12 bytes long - each holds four references to
one symbol. The expression parser creates temporary entries for terms
(10 bytes each) and operators (4 bytes each). Since terms are combined
as soon as possible, the parser almost never needs to store the entire
expression on the heap.
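The entry sizes quoted above allow a rough estimate of primary heap usage. The helpers below are a back-of-envelope sketch: the names are hypothetical, the end-of-string delimiter is assumed to be a single byte, cross-reference entries are assumed to be allocated one per four references (rounded up), and any alignment padding is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
    SYM_FIXED     = 16,  /* fixed part of a symbol table entry */
    RELOC_ENTRY   = 10,  /* one relocation entry */
    XREF_ENTRY    = 12,  /* one cross-reference entry */
    REFS_PER_XREF = 4    /* references held by one such entry */
};

/* Bytes taken by one symbol: fixed entry plus its terminated name. */
size_t symbol_bytes(const char *name)
{
    assert(name != NULL);
    return SYM_FIXED + strlen(name) + 1;
}

/* Bytes of cross-reference entries for nrefs references to a symbol. */
size_t xref_bytes(size_t nrefs)
{
    return (nrefs + REFS_PER_XREF - 1) / REFS_PER_XREF * XREF_ENTRY;
}

/* Bytes of relocation data for n relocatable references. */
size_t reloc_bytes(size_t n)
{
    return n * RELOC_ENTRY;
}
```

Under these assumptions a label named "loop" costs 21 bytes, and five references to it cost a further 24 bytes of cross-reference data.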
The diagram below illustrates the layout of the primary heap. High
memory addresses are at the top of the diagram, while low addresses are
at the bottom. The names on the left of the diagram are the names of the
pointers to the various tables within the heap.
Heap + maxheap ------------->  ___________________________
                              |                           |
                              |       Symbol table        |
struct SymTab *SymStart ----> |___________________________|
                              |                           |
                              |     Symbol references     |
struct Ref *RefStart -------> |___________________________|
                              |                           |
                              |      (unused space)       |
char *HeapLim --------------> |___________________________|
                              |                           |
                              |      Relocation data      |
struct RelTab *RelStart ----> |___________________________|
                              |                           |
                              |   Labels and macro text   |
char *Heap -----------------> |___________________________|
Note that the pointers are to various types. This makes for
lots of interesting casts. (Ain't C fun?) Since the relocation
data is cleared at the end of each section, HeapLim will move up and
down. The "high-water mark" is stored in char *HighHeap, which is
used solely to produce the memory usage message at the end of the
assembly. Note that a program may consist of a section containing
many relocatable references, followed by a section with fewer
relocatable references but lots of symbol references. In this case,
RefStart might end up below HighHeap, and the final message would
indicate that more heap space was used than was available. This is
not an error - only if RefStart hits HeapLim will an error be reported.
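The idea of a heap built from both ends can be sketched in a few lines. This is not A68k's code: the structure and function names are hypothetical, with low playing the role of HeapLim and high the role of RefStart, and alignment is ignored.

```c
#include <assert.h>
#include <stddef.h>

struct two_ended_heap {
    char *base;   /* lowest address of the heap */
    char *low;    /* first free byte above the bottom region */
    char *high;   /* lowest byte used by the top region */
};

void heap_init(struct two_ended_heap *h, char *storage, size_t size)
{
    h->base = storage;
    h->low = storage;
    h->high = storage + size;
}

/* Take n bytes from the bottom; returns NULL if the regions would meet. */
char *alloc_low(struct two_ended_heap *h, size_t n)
{
    char *p;

    if ((size_t)(h->high - h->low) < n)
        return NULL;
    p = h->low;
    h->low += n;
    return p;
}

/* Take n bytes from the top; returns NULL if the regions would meet. */
char *alloc_high(struct two_ended_heap *h, size_t n)
{
    if ((size_t)(h->high - h->low) < n)
        return NULL;
    h->high -= n;
    return h->high;
}

/* Drop everything above mark in the bottom region, in the way
   relocation data is discarded at the end of a section. */
void release_low(struct two_ended_heap *h, char *mark)
{
    assert(mark >= h->base && mark <= h->low);
    h->low = mark;
}
```

Overflow is reported only when the two regions would collide, which matches the remark above that the high-water figure alone does not indicate an error.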
The secondary heap is also built from both ends, but it grows and
shrinks according to how many macros and include files are currently open.
At all times there will be at least one entry on the heap, for the original
source code file.
The bottom of the heap holds the names of the source code file and
any macro or include files that are currently open. The full path is
given. A null string is stored for user macros. Macro arguments are
stored as additional strings, one for each argument in the macro call line.
All strings are stored in minimum space, similar to the labels and user
macro text on the primary heap. File names are pointed to by the fixed
table entries (see below) - macro arguments are accessed by stepping past
the macro name to the desired argument, unless NARG would be exceeded.
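Stepping past the name to reach an argument might look like the sketch below. The function name is hypothetical, strings are assumed to be NUL-terminated and stored back to back, and returning an empty string when the argument number exceeds NARG is an assumption, since this document does not say what is substituted in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * names points at the name string of the current macro entry; its
 * arguments follow as consecutive NUL-terminated strings.  Returns
 * argument n (counting from 1), or "" if n exceeds narg.
 */
const char *macro_arg(const char *names, int n, int narg)
{
    const char *p = names;
    int i;

    assert(names != NULL && n >= 1);
    if (n > narg)
        return "";
    for (i = 0; i < n; i++)
        p += strlen(p) + 1;   /* skip the name, then arguments 1..n-1 */
    return p;
}
```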
The fixed portion of the heap is built down from the top. Each entry
occupies 16 bytes. Enough information is stored to return to the proper
position in the outer file once the current macro or include file has been
completely processed.
The diagram below illustrates the layout of the secondary heap.
Heap2 + maxheap2 ----------->  ___________________________
                              |                           |
                              |     Input file table      |
struct InFCtl *InF ---------> |___________________________|
                              |                           |
                              |   Parser operator stack   |
struct OpStack *Ops --------> |___________________________|
                              |                           |
                              |      (unused space)       |
struct TermStack *Term -----> |___________________________|
                              |                           |
                              |     Parser term stack     |
char *NextFNS --------------> |___________________________|
                              |                           |
                              |   Input file name stack   |
char *Heap2 ----------------> |___________________________|
The "high-water mark" for NextFNS is stored in char *High2,
and the "low-water mark" (to stretch a metaphor) for InF is stored
in struct InFCtl *LowInF. Again, these figures are used only to
determine the maximum heap usage.
Please send me any bug reports, flames, etc. I can be reached
on Mind Link (604/533-2312), at any Panorama (PAcific NORthwest AMiga
Association) meeting, or via Jeff Lydiatt or Larry Phillips.
(I don't have the time or money to live on Usenet or CompuServe, etc.)
Charlie Gibbs
(I can't give a mailing address right now because I'm moving.)